312 research outputs found

    Multichronic complexity in second language development

    Taking a dynamic systems perspective on second language development, this paper argues that development is change over time, which is never stable and has no end state. Moreover, time can be defined at different scales: from the millisecond, minute, week and year to the lifespan. At all scales we can see change over time in language development at different levels of granularity; however, the time scale and level of granularity we use determine to a great extent what we find. What seems a change at one level may be nothing more than natural variation at another. Keywords: multichronic complexity, language development, variation, time scale

    Pelestarian Lingkungan Hidup: suatu Kajian Berdasarkan Pendidikan Kependudukan dan Lingkungan Hidup (PKLH) di Beberapa Sekolah Dasar

    The objective of this research was to examine environmental conservation in connection with the population and environmental education program (PKLH) in elementary schools (SD), specifically those located around Lake Tondano, Minahasa, North Sulawesi. A qualitative-descriptive approach was applied during February 2014 at 8 schools. It focused on headmasters, teachers, and students, followed by observation of the schools and their environment. Using interview and participant-observation techniques, the researcher took an active role in learning activities and interacted directly with students in the classroom, then went on to observe students' activities after and outside school. The results show that PKLH was conducted in SD using an integrative approach, and that students gain more information and knowledge and display appropriate attitudes and behavior, rationally and responsibly, according to their ability and educational level.

    Designing health websites based on users' web-based information-seeking behaviors: A mixed-method observational study

    BACKGROUND: Laypeople increasingly use the Internet as a source of health information, but finding and discovering the right information remains problematic. These issues are partially due to the mismatch between the design of consumer health websites and the needs of health information seekers, particularly the lack of support for “exploring” health information. OBJECTIVE: The aim of this research was to create a design for consumer health websites that supports different health information–seeking behaviors. We created a website called Better Health Explorer with the new design. Through the evaluation of this new design, we derive design implications for future implementations. METHODS: Better Health Explorer was designed using a user-centered approach. The design was implemented and assessed through a laboratory-based observational study. Participants tried to use Better Health Explorer and another live health website. Both websites contained the same content. A mixed-method approach was adopted to analyze multiple types of data collected in the experiment, including screen recordings, activity logs, Web browsing histories, and audiotaped interviews. RESULTS: Overall, 31 participants took part in the observational study. Our new design showed a positive result for improving the experience of health information seeking, by providing a wide range of information and an engaging environment. The results showed better knowledge acquisition, a higher number of page reads, and more query reformulations in both focused and exploratory search tasks. In addition, participants spent more time discovering health information with our design in exploratory search tasks, indicating higher engagement with the website. Finally, we identify 4 design considerations for designing consumer health websites and health information–seeking apps: (1) providing a dynamic information scope; (2) supporting serendipity; (3) considering trust implications; and (4) enhancing interactivity. CONCLUSIONS: Better Health Explorer provides strong support for the heterogeneous and shifting behaviors of health information seekers and eases the health information–seeking process. Our findings show the importance of understanding different health information–seeking behaviors and highlight the implications for designers of consumer health websites and health information–seeking apps.

    A UIMA wrapper for the NCBO annotator

    Summary: The Unstructured Information Management Architecture (UIMA) framework and web services are emerging as useful tools for integrating biomedical text mining tools. This note describes our work, which wraps the National Center for Biomedical Ontology (NCBO) Annotator—an ontology-based annotation service—to make it available as a component in UIMA workflows

    The textual characteristics of traditional and Open Access scientific journals are similar

    Background: Recent years have seen an increased amount of natural language processing (NLP) work on full text biomedical journal publications. Much of this work is done with Open Access journal articles. Such work assumes that Open Access articles are representative of biomedical publications in general and that methods developed for analysis of Open Access full text publications will generalize to the biomedical literature as a whole. If this assumption is wrong, the cost to the community will be large, including not just wasted resources, but also flawed science. This paper examines that assumption. Results: We collected two sets of documents, one consisting only of Open Access publications and the other consisting only of traditional journal publications. We examined them for differences in surface linguistic structures that have obvious consequences for the ease or difficulty of natural language processing and for differences in semantic content as reflected in lexical items. Regarding surface linguistic structures, we examined the incidence of conjunctions, negation, passives, and pronominal anaphora, and found that the two collections did not differ. We also examined the distribution of sentence lengths and found that both collections were characterized by the same mode. Regarding lexical items, we found that the Kullback-Leibler divergence between the two collections was low, and was lower than the divergence between either collection and a reference corpus. Where small differences did exist, log likelihood analysis showed that they were primarily in the area of formatting and in specific named entities. Conclusion: We did not find structural or semantic differences between the Open Access and traditional journal collections.
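
The lexical comparison in this abstract rests on the Kullback-Leibler divergence between word distributions. The following is a minimal sketch of that idea, not the authors' pipeline: the toy token lists, the add-one smoothing, and the function names `word_distribution` and `kl_divergence` are all assumptions made for illustration.

```python
import math
from collections import Counter


def word_distribution(tokens, vocabulary, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a shared vocabulary.

    Smoothing (an assumption of this sketch) keeps every probability
    positive so the divergence below is always finite.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocabulary)
    return {word: (counts[word] + alpha) / total for word in vocabulary}


def kl_divergence(p, q):
    """D_KL(P || Q) in bits; p and q must share the same keys."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p)


# Toy stand-ins for the two collections and a reference corpus.
open_access = "the gene was expressed in the cell".split()
traditional = "the protein was expressed in the tissue".split()
reference = "the market fell as the bank raised rates".split()

vocab = set(open_access) | set(traditional) | set(reference)
p_oa = word_distribution(open_access, vocab)
p_trad = word_distribution(traditional, vocab)
p_ref = word_distribution(reference, vocab)

kl_oa_trad = kl_divergence(p_oa, p_trad)
kl_oa_ref = kl_divergence(p_oa, p_ref)
print(f"OA vs traditional: {kl_oa_trad:.4f} bits")
print(f"OA vs reference:   {kl_oa_ref:.4f} bits")
```

On this toy data the two biomedical-style samples diverge less from each other than either does from the unrelated reference text, which is the shape of the comparison the abstract reports. Note that the divergence is asymmetric, so the direction of comparison has to be stated.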

    Supervised learning for detection of duplicates in genomic sequence databases

    Motivation: First identified as an issue in 1996, duplication in biological databases introduces redundancy and even leads to inconsistency when contradictory information appears. The amount of data makes purely manual de-duplication impractical, and existing automatic systems cannot detect duplicates as precisely as experts can. Supervised learning has the potential to address such problems by building automatic systems that learn from expert curation to detect duplicates precisely and efficiently. While machine learning is a mature approach in other duplicate detection contexts, it has seen only preliminary application in genomic sequence databases. Results: We developed and evaluated a supervised duplicate detection method based on an expert curated dataset of duplicates, containing over one million pairs across five organisms derived from genomic sequence databases. We selected 22 features to represent distinct attributes of the database records, and developed a binary model and a multi-class model. Both models achieve promising performance; under cross-validation, the binary model had over 90% accuracy in each of the five organisms, while the multi-class model maintains high accuracy and is more robust in generalisation. We performed an ablation study to quantify the impact of different sequence record features, finding that features derived from metadata, sequence identity, and alignment quality impact performance most strongly. The study demonstrates that machine learning can be an effective additional tool for de-duplication of genomic sequence databases. All data are available as described in the supplementary material.

    Large-scale protein-protein post-translational modification extraction with distant supervision and confidence calibrated BioBERT

    Protein-protein interactions (PPIs) are critical to normal cellular function and are related to many disease pathways. A range of protein functions are mediated and regulated by protein interactions through post-translational modifications (PTM). However, only 4% of PPIs are annotated with PTMs in biological knowledge databases such as IntAct, mainly performed through manual curation, which is neither time- nor cost-effective. Here we aim to facilitate annotation by extracting PPIs along with their pairwise PTM from the literature, using deep learning on distantly supervised training data to aid human curation. Method: We use the IntAct PPI database to create a distant supervised dataset annotated with interacting protein pairs, their corresponding PTM type, and associated abstracts from the PubMed database. We train an ensemble of BioBERT models, dubbed PPI-BioBERT-x10, to improve confidence calibration. We extend the ensemble average confidence approach with confidence variation to counteract the effects of class imbalance and extract high confidence predictions. Results and conclusion: The PPI-BioBERT-x10 model evaluated on the test set resulted in a modest F1-micro of 41.3 (P = 58.1, R = 32.1). However, by combining high confidence and low variation to identify high quality predictions, tuning the predictions for precision, we retained 19% of the test predictions with 100% precision. We evaluated PPI-BioBERT-x10 on 18 million PubMed abstracts and extracted 1.6 million (546507 unique PTM-PPI triplets) PTM-PPI predictions, and filter [Formula: see text] (4584 unique) high confidence predictions. Of the 5700, human evaluation on a small randomly sampled subset shows that the precision drops to 33.7% despite confidence calibration and highlights the challenges of generalisability beyond the test set even with confidence calibration. We circumvent the problem by only including predictions associated with multiple papers, improving the precision to 58.8%. In this work, we highlight the benefits and challenges of deep learning-based text mining in practice, and the need for increased emphasis on confidence calibration to facilitate human curation efforts.
    Aparna Elangovan, Yuan Li, Douglas E. V. Pires, Melissa J. Davis, and Karin Verspoo
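
The selection step described in this abstract (keep predictions whose ensemble confidence is high on average and varies little across models) can be sketched as below. This is an illustrative sketch only: the thresholds `min_confidence` and `max_std`, the function names, and the example probabilities are hypothetical and do not come from the paper. The sketch also recomputes F1 from the reported precision and recall as a consistency check.

```python
from statistics import mean, pstdev


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


def select_high_quality(ensemble_probs, min_confidence=0.9, max_std=0.05):
    """Return (index, mean, std) for predictions the ensemble agrees on.

    ensemble_probs holds one list per prediction, containing each ensemble
    member's confidence for the predicted label. Both thresholds are
    hypothetical values chosen for this sketch.
    """
    selected = []
    for index, probs in enumerate(ensemble_probs):
        average = mean(probs)
        spread = pstdev(probs)
        if average >= min_confidence and spread <= max_std:
            selected.append((index, average, spread))
    return selected


# Made-up confidences from a three-member ensemble for three predictions.
example_probs = [
    [0.97, 0.95, 0.96],  # high mean, low variation: kept
    [0.99, 0.80, 0.99],  # high mean, but members disagree: dropped
    [0.55, 0.52, 0.58],  # consistent, but low confidence: dropped
]
selected = select_high_quality(example_probs)
print([index for index, _, _ in selected])

# Consistency check on the figures reported in the abstract.
reported_f1 = f1_score(58.1, 32.1)
print(f"{reported_f1:.2f}")
```

With P = 58.1 and R = 32.1 the harmonic mean comes to about 41.35, in line with the reported F1-micro of 41.3 (the small gap is presumably rounding of the published inputs).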

    Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus

    The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult to process by existing natural language analysis tools since they are highly telegraphic (omitting many words), and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning